[Response Ops][Alerting] Updating AlertsClient to provide feature parity with rule registry lifecycle executor #160466
Conversation
Pinging @elastic/response-ops (Team:ResponseOps)
I've only reviewed the code, not the tests, nor done a live test - which I will :-)
However, thought I'd send comments from the code. Specifically interested in the state.start thing issue / PR ...
@@ -76,6 +77,10 @@ export class Alert<
    return this.meta.uuid!;
  }

  getStart(): string | null {
    return this.state.start ? (this.state.start as string) : null;
feels like `${this.state.start}` would be safer than `this.state.start as string`
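For illustration, a minimal self-contained sketch of the two variants under discussion (this is not the PR's actual `Alert` class, just the pattern):

```ts
// Illustrative sketch only; not the actual Kibana Alert class.
interface AlertInstanceState {
  start?: unknown;
}

class ExampleAlert {
  constructor(private readonly state: AlertInstanceState) {}

  // Cast: assumes state.start is already a string; a non-string value leaks through unchanged.
  getStartWithCast(): string | null {
    return this.state.start ? (this.state.start as string) : null;
  }

  // Template literal: coerces whatever is stored in state.start into a string.
  getStartWithTemplateLiteral(): string | null {
    return this.state.start ? `${this.state.start}` : null;
  }
}
```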
Updated in 425c34f
...(legacyAlert.getState().duration
  ? { duration: { us: legacyAlert.getState().duration } }
  : {}),
...(legacyAlert.getState().start ? { start: legacyAlert.getState().start } : {}),
...(legacyAlert.getState().start
hmmm ... I totally forgot about our issue with `start`, `end` and `duration` in `state`: #144929; the PR to fix this never got merged, I'd guess it's in a bad state by now.
Yea, we never merged that PR :(
@@ -96,17 +96,22 @@ export interface TrackedAlerts<
  recovered: Record<string, LegacyAlert<State, Context>>;
}

// allows Partial on nested objects
export type RecursivePartial<T> = {
Not sure if this is still needed, but for some reason I extended this pattern a bit in event log to handle arrays:
kibana/x-pack/plugins/event_log/generated/schemas.ts (lines 19 to 21 in fd1bad1):

type DeepPartial<T> = {
  [P in keyof T]?: T[P] extends Array<infer U> ? Array<DeepPartial<U>> : DeepPartial<T[P]>;
};
Turns out they're everywhere! There's a `DeepPartial` in `@kbn/utility-types` that I ended up using. Updated in 425c34f
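For reference, a small self-contained sketch of how this recursive-partial-with-arrays pattern behaves (the type and the nested shape are re-declared locally here for illustration; the PR itself uses the `DeepPartial` exported from `@kbn/utility-types`):

```ts
// Local re-declaration for illustration; the PR uses DeepPartial from @kbn/utility-types.
type DeepPartial<T> = {
  [P in keyof T]?: T[P] extends Array<infer U> ? Array<DeepPartial<U>> : DeepPartial<T[P]>;
};

// Hypothetical nested shape, just to show the behavior.
interface ExampleAlertDoc {
  kibana: {
    alert: {
      status: string;
      evaluation: { values: number[] };
    };
  };
}

// Every level is optional, and array elements are handled recursively too.
export const partialDoc: DeepPartial<ExampleAlertDoc> = {
  kibana: { alert: { evaluation: { values: [10.8] } } },
};
```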
@elasticmachine merge upstream
💛 Build succeeded, but was flaky
I'm seeing a difference between the "old" docs (without the commit that has the metric threshold rule use the framework) and the "new" docs (with that commit). In "old", the JSON doc is shaped how I would expect:

"_source": {
  "kibana": {
    "alert": {
      "evaluation": {
        "values": [
          10.8
        ]
      },
      "reason": "kibana.alert.rule.execution.metrics.rule_type_run_duration_ms is 10.8 in the last 1 min. Alert when < 100.",
      "action_group": "recovered",
      "flapping": false,
      ... // rest here
    }
  }
}

In "new", it's flattened:

"_source": {
  "kibana.alert.uuid": "a15920f4-4f9c-48fd-8cde-8a6439f425b7",
  "kibana.alert.status": "recovered",
  "kibana.alert.workflow_status": "open",
  "event.kind": "signal",
  "event.action": "close",
  "kibana.version": "8.10.0",
  "kibana.alert.flapping": true,
}

Given conversation yesterday, I believe the …
The rule registry alert doc is actually the flattened version while the alerting framework generated doc is unflattened. Mike and I discussed this difference and decided it was ok since it shouldn't break any queries and it shows up in the alerts table ok (and, as you said, using _source is not recommended).
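To make the "shouldn't break any queries" point concrete, here's a minimal sketch (the node URL and index pattern are illustrative assumptions, not values from this PR): Elasticsearch maps dotted field names and nested objects in `_source` to the same indexed fields, so a term query matches both document shapes.

```ts
import { Client } from '@elastic/elasticsearch';

// Illustrative values only; not taken from the PR.
const client = new Client({ node: 'http://localhost:9200' });

async function findRecoveredAlerts() {
  // Matches documents whether their _source stores nested objects
  // ({ kibana: { alert: { status: ... } } }) or flattened keys ("kibana.alert.status").
  return client.search({
    index: '.alerts-*',
    query: {
      term: { 'kibana.alert.status': 'recovered' },
    },
  });
}
```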
LGTM! Tested it, and it appears the new alert docs have flattened `_source`, but the old docs have object structures. Noted that in a separate comment though, and seems like it's probably fine.
@@ -1750,12 +1795,18 @@ describe('Alerts Client', () => {
        tags: ['rule-', '-tags'],
        uuid: '1',
      },
      start: '2023-03-28T12:27:28.159Z',
I'm curious why you changed the date from 11 to 12! Everything looks fine, just wondering :-)
I don't know! I think I got confused and thought that 12:27 was the hard-coded value I have Date.now() resolve to (which is actually 22:27) so I wanted to make sure the start time was different. Too many numbers!
Resolves #160173
Summary
The rule registry lifecycle executor automatically sets the following fields in alert docs (a rough sketch of these values follows the list):

- `event.action` - `open`, `active` or `close` depending on what type of alert
- `event.kind` - always `signal`
- `tags` - merges rule tags with rule executor reported tags
- `kibana.version`
- `kibana.alert.workflow_status` - set to `open`
- `kibana.alert.time_range`
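As a rough sketch of what those framework-set fields amount to (field shapes and helper names here are assumptions for illustration, not the actual framework code):

```ts
// Illustration only; the real alerting framework / rule registry code differs.
interface FrameworkFieldsInput {
  isNew: boolean;
  isRecovered: boolean;
  ruleTags: string[];
  executorReportedTags: string[];
  kibanaVersion: string;
  start: string;
  end?: string; // only set for recovered alerts
}

function buildFrameworkFields(input: FrameworkFieldsInput) {
  return {
    // open / active / close depending on what type of alert
    'event.action': input.isRecovered ? 'close' : input.isNew ? 'open' : 'active',
    'event.kind': 'signal',
    // merge rule tags with rule executor reported tags
    tags: Array.from(new Set([...input.ruleTags, ...input.executorReportedTags])),
    'kibana.version': input.kibanaVersion,
    'kibana.alert.workflow_status': 'open',
    // shape of the time range here is an assumption
    'kibana.alert.time_range': input.end
      ? { gte: input.start, lte: input.end }
      : { gte: input.start },
  };
}
```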
In addition, the rule registry lifecycle executor provides some helper functions for the rule executors to get the alert UUID, the alert start time (if it exists) and the alert document for recovered alerts (used to set recovered context variables).
This PR augments the framework `AlertsClient` to set the same fields and to provide the same functionality to the rule executors. When an alert is reported via the `AlertsClient`, the UUID (either existing or newly generated) and the start time (for ongoing alerts) are returned to the rule executor. When an executor requests the recovered alerts in order to set context information, the existing alert document is returned.
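As a minimal executor-side sketch of that flow (method names and payload shapes here are illustrative assumptions, not the exact `AlertsClient` API):

```ts
// Illustrative only; the real AlertsClient method names and shapes may differ.
interface ReportResult {
  uuid: string;          // existing UUID for an ongoing alert, or a newly generated one
  start: string | null;  // start time when the alert is ongoing
}

interface ExampleAlertsClient {
  report(args: { id: string; actionGroup: string }): ReportResult;
  getRecoveredAlerts(): Array<{ id: string; hit?: Record<string, unknown> }>;
}

function exampleExecutorRun(alertsClient: ExampleAlertsClient) {
  // Reporting an alert hands back its UUID and start time for use in context variables, etc.
  const { uuid, start } = alertsClient.report({ id: 'host-1', actionGroup: 'threshold_met' });

  // Recovered alerts come back with the previously written alert document,
  // which the executor can use to build recovery context.
  const recoveryContexts = alertsClient.getRecoveredAlerts().map(({ id, hit }) => ({
    id,
    previousStatus: hit?.['kibana.alert.status'],
  }));

  return { uuid, start, recoveryContexts };
}
```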
To Verify
Check out this commit which removes the metric threshold rule from the rule registry lifecycle executor and onboards it to use the framework alerts client. Create a metric threshold rule that creates active alerts and recovers them. Inspect the alert documents to make sure all the fields mentioned above exist. Compare these documents with alerts created using the lifecycle executor.